NSF PAR Search | NSF Public Access Repository


Retrieval and Structuring Augmented Generation with LLMs for Web Applications

https://doi.org/10.1145/3701716.3715870

Jiao, Yizhu; Ouyang, Siru; Zhong, Ming; Zhang, Yunyi; Ding, Linyi; Zhou, Sizhe; Han, Jiawei (May 2025, ACM)

Multimodal Search in Chemical Documents and Reactions

https://doi.org/10.1145/3726302.3730152

Shah, Ayush Kumar; Dey, Abhisek; Luo, Leo; Amador, Bryan; Philippy, Patrick; Zhong, Ming; Ouyang, Siru; Friday, David Mark; Bianchi, David; Jackson, Nick; et al. (July 2025, ACM)

Automated Mining of Structured Knowledge from Text in the Era of Large Language Models

https://doi.org/10.1145/3637528.3671469

Zhang, Yunyi; Zhong, Ming; Ouyang, Siru; Jiao, Yizhu; Zhou, Sizhe; Ding, Linyi; Han, Jiawei (August 2024, ACM)
Baeza-Yates, Ricardo; Bonchi, Francesco (Ed.)
Massive amounts of unstructured text data are generated daily, ranging from news articles to scientific papers. How to mine structured knowledge from these text data remains a crucial research question. Recently, large language models (LLMs) have shed light on the text mining field with their superior text understanding and instruction-following abilities. There are typically two ways of utilizing LLMs: fine-tuning them with human-annotated training data, which is labor-intensive and hard to scale, or prompting them in a zero-shot or few-shot way, which cannot take advantage of the useful information in massive text data. Therefore, automated mining of structured knowledge from massive text data remains a challenge in the era of large language models. In this tutorial, we cover recent advances in mining structured knowledge using language models with very weak supervision. We will introduce the following topics: (1) an introduction to large language models, which serve as the foundation for recent text mining tasks; (2) ontology construction, which automatically enriches an ontology from a massive corpus; (3) weakly-supervised text classification in flat and hierarchical label spaces; and (4) weakly-supervised information extraction, which extracts entity and relation structures.
ActionIE: Action Extraction from Scientific Literature with Programming Languages

https://doi.org/10.18653/v1/2024.acl-long.683

Zhong, Xianrui; Du, Yufeng; Ouyang, Siru; Zhong, Ming; Luo, Tingfeng; Ho, Qirong; Peng, Hao; Ji, Heng; Han, Jiawei (January 2024, Association for Computational Linguistics)

Learning theory for inferring interaction kernels in second-order interacting agent systems

https://doi.org/10.1007/s43670-023-00055-9

Miller, Jason; Tang, Sui; Zhong, Ming; Maggioni, Mauro (June 2023, Sampling Theory, Signal Processing, and Data Analysis)

Modeling the complex interactions of systems of particles or agents is a fundamental problem across the sciences, from physics and biology to economics and the social sciences. In this work, we consider second-order, heterogeneous, multivariable models of interacting agents or particles within simple environments. We describe a nonparametric inference framework to efficiently estimate the latent interaction kernels that drive these dynamical systems. We develop a learning theory that establishes strong consistency and optimal nonparametric min–max rates of convergence for the estimators, as well as provably accurate predicted trajectories. The optimal rates depend only on the intrinsic dimension of the interactions, which is typically much smaller than the ambient dimension. Our arguments are based on a coercivity condition that ensures the interaction kernels can be estimated in a stable fashion. The numerical algorithm presented to build the estimators is parallelizable and performs well on high-dimensional problems, and its performance is tested on a variety of complex dynamical systems.
On the Sparsity of LASSO Minimizers in Sparse Data Recovery

https://doi.org/10.1007/s00365-022-09594-1

Foucart, Simon; Tadmor, Eitan; Zhong, Ming (April 2023, Constructive Approximation)

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

https://doi.org/10.18653/v1/2023.findings-acl.767

Zhong, Ming; Ouyang, Siru; Jiang, Minhao; Hu, Vivian; Jiao, Yizhu; Wang, Xuan; Han, Jiawei (July 2023, Association for Computational Linguistics)

Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and in advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, annotating data for this purpose is cost-prohibitive because of the significant labor required from domain experts. Consequently, the scarcity of training data hinders the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.
Towards Saner Deep Image Registration

Duan, Bin; Zhong, Ming; Yan, Yan (January 2023, International Conference on Computer Vision)

Unsupervised Event Chain Mining from Multiple Documents

https://doi.org/10.1145/3543507.3583295

Jiao, Yizhu; Zhong, Ming; Shen, Jiaming; Zhang, Yunyi; Zhang, Chao; Han, Jiawei (April 2023, ACM)
Proc. 2023 The Web Conf. (Ed.)
Massive numbers of fast-evolving news articles keep emerging on the web. To effectively summarize and provide concise insights into real-world events, we propose a new event knowledge extraction task, Event Chain Mining. Given multiple documents about a super event, it aims to mine a series of salient events in temporal order. For example, the event chain of the super event "Mexico Earthquake in 2017" is {earthquake hit Mexico, destroy houses, kill people, block roads}. This task can help readers capture the gist of texts quickly, thereby improving reading efficiency and deepening text comprehension. To address this task, we regard an event as a cluster of different mentions with similar meanings. In this way, we can identify the different expressions of events, enrich their semantic knowledge, and replenish relation information among them. Taking events as the basic unit, we present a novel unsupervised framework, EMiner. Specifically, we extract event mentions from texts and merge mentions with similar meanings into a cluster representing a single event. By jointly incorporating both content and commonsense, essential events are then selected and arranged chronologically to form an event chain. Meanwhile, we annotate a multi-document benchmark to build a comprehensive testbed for the proposed task. Extensive experiments verify the effectiveness of EMiner in terms of both automatic and human evaluations.
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

https://doi.org/10.18653/v1/2023.emnlp-main.620

Jiao, Yizhu; Zhong, Ming; Li, Sha; Zhao, Ruining; Ouyang, Siru; Ji, Heng; Han, Jiawei (January 2023, Association for Computational Linguistics)
